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NETWORKS 
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Brown University 

Importance sampling is a technique that is commonly used to 
speed up Monte Carlo simulation of rare events. However, little is 
known regarding the design of efficient importance sampling algo- 
rithms in the context of queueing networks. The standard approach, 
which simulates the system using an a priori fixed change of measure 
suggested by large deviation analysis, has been shown to fail in even 
the simplest network setting (e.g., a two-node tandem network). 

Exploiting connections between importance sampling, differential 
games, and classical subsolutions of the corresponding Isaacs equa- 
tion, we show how to design and analyze simple and efficient dynamic 
importance sampling schemes for general classes of networks. The 
models used to illustrate the approach include d-node tandem Jack- 
son networks and a two-node network with feedback, and the rare 
events studied are those of large queueing backlogs, including total 
population overflow and the overflow of individual buffers. 

1. Introduction. For more than two decades, there has been a growing of 
interest in importance sampling, a method in which the system is simulated 
under a different probability distribution (i.e., change of measure), for fast 
simulation of rare events in queueing networks [2, 11]. 

The standard approach to importance sampling for queueing considers an 
a priori fixed and static change of measure that is suggested by large de- 
viation analysis. This approach works well for simulating large buildups of 
a single/multiple server queue [1, 14]. However, there has been limited suc- 
cess in extending this standard heuristic to networks of queues. In even the 
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simplest network setting, such as a two-node tandem Jackson network, the 
change of measure suggested by the standard heuristic fails to be asymp- 
totically optimal in general [10, 13] and can lead to importance sampling 
estimators with infinite variance [3] . This failure is in fact due to the discon- 
tinuities of the state dynamics on the boundaries of the state space. Such 
discontinuities are not present in the case of a single queue. 

The purpose of the present paper is to present a framework under which 
one can systematically build efficient dynamic (i.e., state dependent) im- 
portance sampling schemes for simulating rare events in queueing networks. 
Our method heavily exploits a recently discovered connection between im- 
portance sampling and deterministic differential games [7, 8] and the role of 
classical subsolutions of the Isaacs equation associated with the game [9]. 
We demonstrate that one can construct classical subsolutions, as the molli- 
fication of the minimum of affine functions, that lead to simple and efficient 
importance sampling schemes. 

To illustrate the main idea, we focus in much of the paper on two-node 
tandem Jackson queueing networks. The rare events of interest are various 
types of buffer overflows, including total population overflow and individual 
buffer overflows. Also discussed are extensions to d-node tandem Jackson 
networks (Section 4) and a two-node Jackson network with feedback (Sec- 
tion 5). We wish to point out that our approach can be applied to general 
Jackson networks and networks with more general arrival/service processes 
(such as Markov modulated processes), and such results will be reported 
elsewhere. To the best of our knowledge, the present paper is the first to 
provide a rigorous theoretical framework in which one can build asymptot- 
ically optimal importance sampling algorithms for rare events in networks 
of queues. 

The paper is organized as follows. Section 2 gives a brief review of the 
basics of importance sampling. In Section 3, we study in detail the classical 
problem of total population overflow in two-node tandem Jackson networks. 
Extensions are discussed in Sections 4 and 5. To ease exposition, most proofs 
are collected in the Appendices. 

2. Basics of importance sampling. The basic idea of importance sam- 
pling is to use a change of measure, that is, the system is simulated under a 
different probability distribution and the outcomes are multiplied by appro- 
priate likelihood ratios (i.e., Radon-Nikodym derivatives) to form unbiased 
estimators. 

We specialize to the estimation of rare event probabilities and consider a 
family of events {A n } in a probability space (Q,J-,¥) such that 

lim--logP(^„) = 7>0. 

n n 
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In order to estimate ¥(A n ), importance sampling generates samples under a 
probability measure Q such that P <C Q, and forms an estimator by averaging 
independent replications of 



where dF/dQ is the Radon-Nikodym derivative. It is easy to check that 
this importance sampling estimator is unbiased. Its rate of convergence is 
determined by the variance of p n . The smaller the variance, the faster the 
convergence. Thanks to the unbiasedness of p n , minimizing the variance 
amounts to minimizing the second moment, which is 



We say the importance sampling estimator is asymptotically optimal if 



Sometimes 2j is referred to simply as the "optimal decay rate." 

Remark 2.1. The requirement of P<CQ is more stringent than neces- 
sary. It is sufficient that P be absolutely continuous with respect to Q on a 
sub-cr-algebra that contains A n , in which case the likelihood ratio is defined 
as the Radon-Nikodym derivative of P and Q when they are restricted on 
this sub-cr-algebra. In this paper, the changes of measure will be applied to 
a sequence of i.i.d. random variables {Y(k)}, and will be restricted to the 
cr-algebra generated by {Y(k)} up until the time either the buffer overflow 
happens or the system returns to the empty state. Note that when consid- 
ered on the full cr-algebra generated by {Y(k)}, it is typical that P is singular 
with respect to Q. 

3. Two-node tandem Jackson networks. To illustrate the main idea of 
the game/subsolution approach toward importance sampling, we specialize 
to two-node Jackson tandem queueing networks, where the arrival process 
is Poisson with rate A and the service times are distributed exponentially 
with rates /ii and ^2, respectively (see Figure 1). The system is assumed to 
be stable, that is, A < minj^i,/^}- 

Suppose that the two queues share one buffer with capacity n, and that 
we are interested in the overflow probability 

p n = Pjnetwork total population reaches n before returning to 0, 





lim sup log E^ [p n ] < lim sup log E^ \p n ] = 27. 



n n n n 



lim inf log E \p n ] > 27. 



starting from 0}. 
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This overflow problem was among the first to be studied in the literature 
on importance sampling for networks, and has served as a benchmark since 
then [13]. 

Rescaling the time variable will have no effect on p n , and so without loss 
of generality we assume A + Mi + ^2 — 1- Since exchanging the order of service 
rates does not affect this probability [16], we further assume that \i2 < fJ-i- 
Under these conditions, we have the large deviation limit [10] 

(3.1) lim logp n = log^ = 7. 

3.1. The standard heuristic. Based on a heuristic application of large 
deviation analysis, Parekh and Walrand [13] proposed a state-independent 
importance sampling algorithm for estimating p n , which amounted to in- 
terchanging the arrival rate and the smallest service rate. For this scheme, 
numerical experiments suggested good performance for a certain range of 
parameters [13]. 

A rigorous analysis of this importance sampling algorithm first appeared 
in [10], in which the authors showed that the algorithm is asymptotically 
optimal when the parameters fall into certain subset. However, it was also 
shown that the asymptotic optimality fails in general, such as when the two 
service rates are nearly equal and the arrival rate is small. A recent paper [3] 
extended these results and showed that this scheme can lead to estimators 
with infinite variance for certain values of parameters. Additional discussion 
on importance sampling for queueing networks can be found in [11]. 

3.2. The system dynamics. The system state can be described by the em- 
bedded discrete time Markov chain Z = {Z(k) = (Zi(k), ^(/c)) : k = 0, 1,2, . . .} 
defined on a probability space (f2, J 7 , P), where Zi(k) is the queue length at 
node i after the kth. transition epoch of the network. 

At times when both queues are nonempty, the increments of the Markov 
chain Z take values in the space 

V={«o = (l,0), Ui = (-l,l), U2 = (0,-l)}, 

with corresponding to an arrival and Vi to a service at node i for i = 1,2. 
On the boundary where either queue is empty, the dynamics exhibit different 
behaviors. Suppose that the queue at node i (i = 1,2) is empty. Then it 
is impossible for the process Z to have increment Vi since it will lead to 




Fig. 1. Two-node tandem, Jackson network. 
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Fig. 2. The system dynamics. 



negative queue size. One way to describe this discontinuity is to allow Z 
to make fictitious jumps of size Vi on the boundary, but they have to be 
accounted for by "pushing back" the state along the direction of constraints 

di = -Vi, 

so that the state process Z stays nonnegative. See Figure 2. 

To summarize, the evolution of the Markov chain Z can be modeled by 
equation 

(3.2) Z(k + 1) = Z(k) + 7t[Z(k),Y(k + 1)], 

where Y = {Y(k) : k > 1} is a sequence of random variables taking values in 
the space V, and the mapping ir is defined for every z = (z\,Z2) € M.+ and 
y € V as 

,„ „, r i _^ / 0, if Zi = and y = v% for some i = 1, 2, 

\y, otherwise. 

The distribution of Z is completely determined by that of the sequence 
Y = {Y(k)}. Define 

7 7+ (V) = {6 = (0o, 61,62) '■ 6 is a probability measure on V 

and di = 6[vi] > for every i = 0, 1,2}. 

Under the (true) probability measure P, Y is a sequence of independent 
identically distributed (i.i.d.) random variables with distribution 

@ = (X,^,fi 2 )eV + (Y). 
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3.3. The dynamic importance sampling algorithms. The importance sam- 
pling schemes we consider use state-dependent changes of measure that 
can be characterized by stochastic kernels n [-|-] on V given that is, 
@ n [-\x] G P+(V) for every i£l 2 + . 

To be more precise, for a given threshold n, define the scaled state process 
X n = Z/n, where Z is defined as in (3.2). Since the definition of it implies 
ir[nx,y] = ir[x,y] for every x G X n satisfies the equation 



(3.4) 



1 



X n (k + 1) = X n (k) + -ir[X n (k),Y(k + 1)], 



with initial condition X n (0) = Z(0)/n = 0. The importance sampling gener- 
ates {Y(k)} as follows. The conditional probability of Y(k + 1) = V{, given 
{Y(j):j = 1,2,... ,k}, is just O n [vi\X n (k)} for each i = 0,1,2. 
Define the hitting times 

T n = inf{k > : X?(k) + X$(k) = 1}, 

T = inf{k > 1 : X™(k) = X%(k) = 0}. 

Let A n be the event of interest, that is, 

A n = {X™ + X2 reaches 1 before returning to 0} = {T n < To}. 

The importance sampling estimator is just 

T n -1 

(3.5) pn=u n - n 



@[Y(k + l)] 



k=0 



Q n [Y{k + l)\X n (k)\ 



The second moment of p n , thanks to (2.1), equals i? p [p n ]. The goal is 
to choose a stochastic kernel G n so that this second moment (whence the 
variance of p n ) is as small as possible. Another important consideration is 
that one would like G n to be simple and easy to implement. 

3.4. Notation and terminology. Before we proceed to construct impor- 
tance sampling algorithms, we collect in this section some notation and 
terminology. Define 

D = {(xi,x 2 ):xi >0,xi+x 2 < 1}, 
D = {(xi,x 2 ) :xi > 0,xi + x 2 < 1}, 
d 1 = {(0,x 2 ):0<x 2 <l}, 
d 2 = {{x 1 ,0):0<x 1 <l}, 
d e = {(xi,x 2 ):xi > 0,xi + x 2 = 1}, 
D n = Dn{(z 1 ,z 2 )/n:(z 1 ,z 2 )£Z 2 + }, 

D n = Dn{(zi,z 2 )/n:(z 1 ,z 2 ) 
Sometimes we refer to d e as the "exit boundary." 
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Remark 3.1. Relative entropy representation for exponential integrals. 
Let (S, J-) be a measurable space and /:5->Ma bounded measurable func- 
tion. Denote by V{S) the space of probability measures on (S,J-). Then for 
any 7 G V(S), 

-log / e~ f dj = inf R(e\h)+ [ f d9 . 
Js eev(s)[ Js 

Furthermore, the minimizer of the right-hand side exists and is mutually 
absolutely continuous with respect to 7. Here the relative entropy is 
defined as 

R(o\\i) = \l s log % de ' if ' <<7 ' 

[ 00, otherwise. 
We refer the readers to [4], Proposition 1.4.2, for the proof. 

3.5. The Isaacs equation. In this section we formally derive the Isaacs 
equation associated with the limit differential game that lies underneath 
importance sampling algorithms. A rigorous argument, though possible, is 
not necessary for our purpose. 

Recall our goal is to choose a stochastic kernel @ n so as to keep the second 
moment i5^ p [pn] as small as possible. We can think of this as a stochastic 
control problem and write down the corresponding dynamic programming 
equation (DPE). To this end, we extend the dynamics and let, for every 

x e D n , 

' y e[Y(k + i)} 

A - l = l e-[Y(k + l)\Xn(k)] 



V n (x)=mfEl[p n ]=miEl 



where p n is defined in exactly the same fashion as in Section 3.3 and E% 
denotes expected value conditioned on X n (0) = x. 

For simplicity, we further assume that x € D n , whence tt[x, y] = y for 
every y € V. Under the original probability measure IP, the sequence {Y(k)} 
is i.i.d. with distribution 0. Hence the DPE 

V n {x) = _ inf V V n (x + - Vi ) • Q[vi] 
eeP+(v)^ V n J Q[vi] 

holds. Consider a logarithmic transform of V n and define 

W n {x) = --\ogV n (x). 
n 

We have 

2 — 

W n {x)= sup --log^2exp{-nW n (x + -v^) - log ^j^jW^]. 
e&p+(Y) n I V n J G[vi]i 
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Applying the relative entropy representation for exponential integrals (see 
Remark 3.1) to the right-hand side of the last equation, it follows that 

2 



W n (x) = sup inf 



W n ( x + - Vi 



E^Niog|f^} + ^l|e) 



n \f^ @[n 

Note that taking infimum over 6 € V + (Y) is equivalent to taking infimum 
over 9 £ V(¥) since by Remark 3.1 the minimizing 6 is mutually absolutely 
continuous to ©, whence it belongs to V + (V). 

Suppose for now that W n (x) converges to W(x). Let DW be the gradient 
of W and formally assume the approximation 

wJx + -v t )-W n (x)*i-(DW(x),Vi). 
\ n J n 

Observing = 1, we arrive at 



2 — 

(DW(x),¥(e)) + log + R(6\\G) 



(3.6) = sup inf 
where 

2 

(3.7) ¥(6) = Y l 0[vi]-v i 

i=0 

for each 6 6 V + (Y). Equation (3.6) is called an Isaacs equation. 

We now discuss the boundary conditions. For the exit boundary, we have 
by definition V n {x) = 1 or W n {x) = 0, therefore we impose the Dirichlet 
boundary condition 

(3.8) W{x) = for x G d e . 

For d\ and 82, we impose the Neumann boundary condition that is typically 
associated with constrained dynamics [12] 

(3.9) (DW(x),di) =0 forxedi. 

Finally, we make a few remarks on the game interpretation of importance 
sampling. The Isaacs equation (3.6) indicates that the underlying game has 
two players. The player who chooses the change of measure in order to 
minimize the second moment (i.e., 0) becomes the maximizing player in the 
game due to the negative sign in the logarithmic transform. The minimizing 
player is artificially introduced, and chooses 6. We will refer to this player 
as the "large deviation player." The dynamics of the game are completely 
determined by 6, or the choice of the large deviation player, while the running 
cost of the game depends on the choices of both players. 
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3.6. The properties of the Hamiltonian. Our construction of importance 
sampling algorithms is based on classical subsolutions to the Isaacs equation. 
Therefore it is useful to study the properties of this equation. Define for each 

pel 2 



(3.10) H(p)= sup inf 
eep+ryseep+C 



©k 



<p,F(0)> +£0k]log^ + JB(0||e) 
1=0 u ™ 



The function H is called the Hamiltonian, and the Isaacs equation (3.6) can 
be written as 

(3.11) M(DW) = 0. 

We have the following result, whose proof is straightforward and therefore 
omitted. 

Proposition 3.2. Let H be defined as in (3.10). 

1. For each p= (pi,P2) € K 2 , £/iere exisis a saddle point {&* (p) , 0* (p)) £ 
7> + (V) x 7>+(V) #roen fry 

@*(p)=e*(p) = N(p)(Xe- pi / 2 ,fi 1 e {pi - p2)/2 ,fi2e P2/2 ), 

where 

N(p) = [Ae- pi / 2 + /V" 2 )/ 2 + M 2 e P2 / 2 ] -1 . 

/re particular, the order of sup and inf can be exchanged in (3.10). 

2. H is concave and has the representation 

M{p)= g inf [(p,F(0)) + 222(0||0)] =21ogJV(p). 

For any pGK 2 ,we will refer to @*(p) as the (saddle point) change of measure 
corresponding to p. 

Figure 3 is a picture of the zero-level curve of H. Recall that 7, as defined 
in (3.1), equals log(/i2/A). 



3.7. T/ie solution to the Isaacs equation. It is well known that viscosity 
solutions provide physically meaningful solutions to equations such as (3.11). 
However, viscosity solutions to the Isaacs equation (3.11) and boundary con- 
ditions (3.8) and (3.9), which are only weak-sense solutions, are not suitable 
for the purpose of constructing efficient importance sampling algorithms for 
this tandem Jackson network. 

More precisely, consider the very simple, affine function 



W s (x) = (n,x) + 2 7 . 
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n = 2y(-i,-l) 
r 2 = 2y(-l,0) 
r» = (0,0) 

f, = 21og(AtiA)(-l,0) 



Fig. 3. Hamiltonian H. 



W s is a viscosity solution to the Isaacs equation (3.11) and boundary condi- 
tions (3.8) and (3.9). Even though W s (0) equals the optimal decay rate 27, 
the corresponding saddle point change of measure, by Proposition 3.2, is 



which is exactly the state-independent change of measure based on standard 
heuristic (i.e., switching the arrival rate and the smallest service rate) and 
therefore inefficient in general. 

As remarked previously, the failure of the importance sampling based on 
W s is due to the fact that W s is only a weak-sense viscosity solution. It is 
not a classical solution (or even a classical subsolution as defined in the next 
subsection), since on the boundary 82 



In a sense that we will make precise later on, this inequality is in the "wrong" 
direction, which suggests that the (artificial) large deviation player, who 
determines the game dynamics, may be able to exploit this "bad" bound- 
ary to a degree that the importance sampling estimator based on W s be- 
comes inefficient. It is not coincidental, as observed in [10], that the ineffi- 
ciency is because a sample path can spend a significant amount of time near 
the boundary 82 before leaving domain D and thereby accumulate a huge 
Radon-Nikodym derivative. 

3.8. Subsolutions and importance sampling schemes. The idea of [9] is 
that classical subsolutions to Isaacs equations can be used to construct ef- 
ficient importance sampling schemes. It has advantages over solution-based 
importance sampling schemes in simplicity, greater flexibility, and general 
applicability. The goal of this section is to construct classical subsolutions 



e*(£w s ) = e*(n) = (At2,Mi,A) 



(DW s ,d 2 ) = (r 1 ,d 2 ) 



2 7 <0. 
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and identify the corresponding changes of measure. As in [9], the construc- 
tion is divided into two steps. We first identify a piecewise smooth subso- 
lution as the minimum of affine functions and then mollify it to obtain a 
classical subsolution. 



Definition 3.3. A function W : D — > R is a classical subsolution to the 
Isaacs equation (3.11) and boundary conditions (3.8) and (3.9) if: 

1. W is continuously differentiable, 

2. M(DW(x)) > for every x£D, 

3. W(x) <0 for xed e , 

4. (DW{x),di) > for x e d h i = 1,2. 

3.8.1. Construction of piecewise affine subsolutions. We will need a piece- 
wise affine subsolution W with the following properties (see Figure 4). 

1. W can be written as W = W\ A W<i A W3 where Wk is an affine function 
for each k = 1, 2, 3. 

2. D is divided into three regions R±, R2 and R3, such that in each region 
R k , W = W k . 

3. The subsolution property M(DW(x)) = M(DWk{x)) > holds for every 
x in the interior of region R^. 

4. The Dirichlet boundary inequality W{x) < holds for x G <9 e - 

5. The Neumann boundary inequality {DW(x),di) > holds whenever x 6 
di and DW(x) exists. 

One such subsolution can be constructed as follows. Fix an arbitrary 5 > 
and let, for each k, 

(3.12) W£(x) = (r k ,x)+2 1 -k5, 
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where the r^'s are depicted in Figure 3. It is not difficult to check that 

W 5 = Wf A Wi A Wl 
satisfies all the requirements for all small 5 > 0. 

Remark 3.4. The failure of the boundary inequality along the x\ axis 
for W s (see Section 3.7) corresponds to the existence of a boundary layer 
in the prelimit which vanishes in the limit. It is for this reason that we 
introduce W^, which perturbs the gradient in a neighborhood of this axis. 
A similar perturbation is not required along the X2 axis, since the bound- 
ary inequality already holds there. w£ is introduced to ensure that both 
boundary conditions hold in a neighborhood of the origin. 

3.8.2. Mollification. To mollify the piecewise affine subsolution W s , we 
will adopt a mollification called exponential weighting that is specialized to 
the minimum of a finite set of smooth functions. For future reference, we 
describe the mollification in its general form. 

Consider continuously differentiable functions {h\, h,2, ■ • ■ , hx}, and let 

h = hi A/12 A •• ■ A h K . 
Fix a small positive number e and define the mollification 

h £ {x) = -elog^exp|--/i fe (x) j. 

We have the following result, whose proof is straightforward and can be 
found in [9], Section 3.3. 

Lemma 3.5. For any e> 0, h £ is continuously differentiable with 

K 

Dh £ (x) = J2pk(x)Dh k (x), 
k=i 

where 

. exp{-hi(x)/e} 

Pi{X) = . 

Efc=iexp{-/i fc (x)/e} 
Furthermore, we have the uniform bounds 

-Ke<h e (x)-h(x)<0 

for every x. 

Note that p e (x) = (pf (x), jo|(x), . . . ,p £ K {x)) defines a probability vector in 
the sense that p%(x) > and 

fc=l 
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3.8.3. The classical subsolution. Applying this mollification to W s , we 
define 

(3.13) W e > 5 {x) = -elog ^exp|--Wif(x)}. 

Thanks to Lemma 3.5, W £ ' 5 is continuously differentiable with 

(3-14) DW £ ' 5 (x) = J2Pk 6 (v)rk, 

k=i 

where 



(3.15) pf°(x) 



exp{-W l 5 (x)/e} 
Zl =1 eM-W£(x)/e} 



We should notice that with this mollification, the function W e,s is not pre- 
cisely a classical subsolution, but only approximately. Indeed, Lemma B.l 
states that the Neumann boundary conditions are only satisfied approxi- 
mately in the sense that, for x £ <9j, 

(DW E ' s (x),di)>-s 

for some small positive number e as long as e/5 is chosen small. The rea- 
son for this violation of the subsolution property is that the exponential 
weighting is not a "local" smoothing. However, the advantages of the expo- 
nential weighting (especially the analytical tractability) outweigh the minor 
additional complications in the analysis introduced by this error. 

3.8.4. The importance sampling estimator and its asymptotics. For each 
k, let @* k be the saddle point change of measure that corresponds to the 
affine function W k , that is, 

@* k = @*(DW k ) = e*{r k )eP + (Y), 

where &*(■) is as defined in Proposition 3.2. Straightforward calculation 
yields that 



©l = (/i2,^i, A), ©2 = 77 — rTTT^O^A^i,/^), 03 = (A,/ii,jU 2 ). 



1 

A/il + 2fl% 

The change of measure based on the W e ' S is just a state-dependent mix- 
ture of Q k . More precisely, define a stochastic kernel e,l5 [-|-] by 

(3.16) e £ '^\x] = j2p £ k 5 (xm(.)eV + (Y), 

k=l 
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and for each fixed n, let 

(3.17) e n [-\-] = @ £ ' s [.|-]. 

In other words, the importance sampling algorithm simulates Y(k + 1), con- 
ditional on the sample history {Y(j) : 1 < j < k}, from the distribution 
B E > s [-\X n (k)], where X n is the state process as defined in (3.4). The im- 
portance sampling estimator p n is then given by (3.5). 

We have the following result regarding its asymptotic performance, whose 
proof is deferred to Appendix B. 

Theorem 3.6. There exist a pair of positive constants (A,B) that only 
depend on the system parameters (A,/xi,/Z2) such that, provided e/5 < B, the 
second moment of the importance sampling estimator p n satisfies 

liminf \og[2nd moment of p n ] >2-f — F(e, 5), 

where 

F(e,5) = 3e + 35 + Aexp{-5/e}. 

Since 27 is the optimal decay rate, the theorem suggest that the impor- 
tance sampling scheme is nearly asymptotically optimal as long as F(e,5) 
is small. This can be achieved if one sets both 5 and e/5 small. 

Remark 3.7. The formula of F also provides an interesting relation 
between e and 5. For each fixed small e, F(e, •) is minimized at 

5 = — e log e + e log ^ w — e log e. 

This suggests that a good strategy is to set 5 = — eloge. Note that in this 
case, when e is small, so are 5 and F(e,5). 

3.8.5. Asymptotic optimality. The previous section provides a nearly 
asymptotically optimal importance sampling algorithm. It is good enough 
for many practical purposes where n is large but not exceedingly large. How- 
ever, one would still like to see an algorithm that gives optimality. This only 
requires a slight modification. 

Instead of using a fixed pair of parameters e and 5 for all re, we now allow 
them to vary depending on re and denote them by e n and 5 n . For each re, 
we use the change of measure based on W e "' Sn , which amounts to letting 

3 

(3.18) Q n [-\x] = e en ' Sn [-\x] = Y,Pk l ' 5n ( x )®U-) €^ + (V). 

fc=i 

Abusing the notation a bit, we again denote by p n the corresponding im- 
portance sampling estimator. 
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Theorem 3.8. The estimator p n is asymptotically optimal, that is, 
lim \og\2nd moment of p n ] =2-~f, 

n n 

provided that S n — > 0, e n /5 n — > and ne n — > oo. 

Remark 3.7 suggests that a good choice is to set 5 n = —e n \oge n . In this 
case, asymptotic optimality follows if e n —> and ne n — » oo. 

3.8.6. Further remarks on the importance sampling algorithms. The com- 
putation of the weights {p 6 ^} or {p e k n,Sn } is very simple. As a consequence, 
the dynamic importance sampling algorithms based on (3.16), (3.17) or 
(3.18) are practically as fast as the standard heuristic scheme where a con- 
stant change of measure is used. 

It is possible that one can associate other changes of measure with subso- 
lutions. For example, one can define 9 n [-|x] = G*{DW £ > 5 (x)) in lieu of (3.16) 
and (3.17), and the resulting algorithms will have similar asymptotic perfor- 
mance. However, the use of mixtures such as (3.16) is computationally more 
convenient. This is especially the case when the change of measure that cor- 
responds to a particular gradient is not easily obtainable. For example, for a 
system with Markov modulated arrival and service rates, the computation of 
the change of measure corresponding to a single gradient p requires solving 
an eigenvalue/eigenvector problem. If we smooth first and then compute the 
change of measure suitable for each point x, then many such problems must 
be solved. In contrast, mixtures like (3.16) only require the computation of 
the changes of measure that correspond to the finite collection of vectors r k ■ 

3.9. Numerical results. In this section we present some numerical results 
for the case where A = 0.1, fi\ = fj,2 = 0.45. For comparison, the theoretical 
value of p n is obtained by iteratively solving the linear system of equations 
that characterize this probability, an approach that is feasible when the 
system is sufficiently small. Note that in this case, the standard heuristic 
importance sampling scheme leads to estimators with infinite variance [3]. 

In the simulations, we always set 5 = — eloge. This choice was suggested 
by Remark 3.7, and was experimentally observed to be a good choice for 
small e. We ran simulations for n = 20, with e = 0.01, 0.02 and 0.03, respec- 
tively. For each e we present two estimates and each estimate consists of 
20,000 replications. The theoretical is p n = 6.0 x 10 -12 (see Table 1). 

In all the tables, "Std. Err." stands for "standard error" and "C.I." for 
"confidence interval." The performance of the dynamic importance sampling 
schemes based on subsolutions is stable across different simulations, with 
good estimates and small standard errors. 

In Table 2 there are more simulation results with n = 30,40,50, with 
e = 0.02 and 5 = —eloge. Each estimate consists of 20,000 replications. 
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Remark 3.9. It is not difficult to check that the "thickness" or the 
height of the boundary region R2 in Figure 4 is 5/(2^). Since we are scaling 
the queue sizes by a factor n, the thickness of the boundary region in the 
prelimit will be n5/(2j) when unsealed. However, the optimality conditions 
ne n — > 00 and £ n /<5 n — > in Theorem 3.8 imply that n5 n — > 00. This does 
not allow the boundary region to be too thin in the prelimit. The need for 
such control is supported by experimentation, which shows that for a fixed 
n, the simulation results tend to deteriorate when e is too small. 

4. Extensions to <i-node tandem Jackson networks. The work on the 
two-node tandem Jackson network can be easily extended to d-node tan- 
dem Jackson networks and more general exit probabilities. To be more 
precise, consider a d-node tandem Jackson network with Poisson arrival 
rate A and consecutive exponential service rates fii, . . . , fid- The state of 
the network is described by the embedded Markov chain Z = {Z(k)} = 
{(Zi(k), . . . , Zd(k))}, where Zi denotes the queue length at node i. The sys- 
tem is assumed to be stable, that is, A < min{/ii, . . . ,/J,d}- Let T be a closed 
subset of such that ^ F and the closure of \ F is compact. We are 
interested in the following rare-event probability: 

p n = P{Process Z hits set nF before returning to 0, starting from 0}. 



Table 1 

IS based on subsolutions, two-node tandem, total population overflow 





e = 


0.01 


e = 


: 0.02 


e = 0.03 




No. 1 


No. 2 


No. 1 


No. 2 


No. 1 No. 2 


Estimate (xlCT 12 ) 
Std. Err. (xl(T 12 ) 
95% C.I. (xl(T 12 ) 


5.7 
0.4 
[4.9, 6.4] 


5.5 
0.3 
[4.9, 6.1] 


5.8 
0.3 
[5.2, 6.4] 


6.1 
0.5 
[5.1, 7.1] 


6.3 5.8 
0.4 0.2 
[5.5, 7.1] [5.3, 6.3] 


IS based on 


Table 2 
subsolutions, two-node tandem, 


total population overflow 




n = 


-- 30 


n 


= 40 


n = 50 


Theoretical value 
Estimate 
Std. Err. 
95% C.I. 


2.63 x 10 -18 
2.73 x 10~ 18 
0.18 x 10~ 18 
[2.37,3.09] x 10~ 18 


1.03 x 10 -24 
1.05 x 10~ 24 
0.03 x 10~ 24 
[0.99,1.11] x 10" 24 


3.80 x 10~ 31 
3.75 x 10~ 31 
0.16 x 10" 31 
[3.43,4.07] x 10" 31 
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Without loss of generality, we assume that A + {i\ + • • • + fxj, = 1. We also 
assume that p n decays exponentially with 

lim log p n = 7. 

n n 

4.1. Isaacs equation and the Hamiltonian. The increments of Z take 
values in V = {vq, v \ , . . . , Vd} where the v^s are (i-dimensional vectors defined 
by 

Vq = (1,0,..., 0), Nj = sl, if j = i + 1 and j<d, 

[ 0, otherwise. 

Similarly to (3.4), the scaled state process X n = Z/n satisfies 

X n {k + 1) = X n (k) + —ir[X n (k),Y(k + 1)], 
n 

where n plays the same role as in (3.3). The sequence {Y(k)} consists of 
i.i.d. random variables taking values in V with common distribution 

e = (A, Wl ... lW )eP + (¥). 

Define the regions 

D = {x E :x (£ T, Xi > 0, i = 1, . . . , d}, 

di = {x e R+ :x <£ T, Xi = 0}, i = 1, . . . , d, 

and the directions of constraints di = —V{. 

The Isaacs equation is just M(DW) = 0, where 

H(p) = sup inf 

with 

d 

¥(e) = ^2e[vi]-vi 

8=0 

for every 6 G V + (Y). The boundary conditions are W(x) = for x € T and 
(DW(x),di) = for xedi. 

The following result is an extension of Proposition 3.2, whose proof is 
very similar and thus omitted. 

Proposition 4.1. For every p £ W d , there exists a saddle point for the 
Hamiltonian M, say (8*(p),0*(p)) G?+(¥) x V + (V), given by 

e*(p)[v i ] = 9*(p)[v t ]=N(p)-e[v i ]exp{-(p,v i )/2}, 



( p ,w(6))+J2ehnog^p\+R(e\\e) 
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where 



N(p) 



J2@[vi]exp{-{p,Vi}/2} 



i=0 



Moreover, the Hamiltonian H is concave and H(p) = 21ogiV(p). 

4.2. Subsolutions and importance sampling schemes. The construction 
of subsolution proceeds in an analogous fashion: we start with a piecewise 
smooth subsolution and then mollify it by exponential weighting. We will 
discuss the general case where the subsolutions can vary depending on n. To 
be more specific, let (WJ 1 , ■ • ■ , W^) be smooth functions (preferably affine 
functions) and let 

W n = Wi A • • • A W%. 
The choice of {WjJ 1 } should have the following properties: 

1. W(DWf}(x)) > for every x € D and every k, 

2. W n (x) < for every iET, 

3. for x on boundary <9j, (DW n (x),di) > when Z)VF n (x) is well defined. 

The quantities W £n,n (x), p e i n " n {x) and the stochastic kernel n are defined 
in a fashion exactly analogous to (3.13), (3.15), (3.16) and (3.17). We denote 
by p n the corresponding importance sampling estimator. 

The following result is an extension of Theorem 3.8. The proof is very 
similar and thus omitted. 



Theorem 4.2. Assume that {W£(x)} has uniformly bounded first and 
second derivatives for x € D and that there exists e n > such that for x £ d{, 
(DW £n ' n (x),di) > -e n . We also assume that liminf n W n (0) > 27. Then the 
importance sampling estimator p n is asymptotically optimal, that is, 

lim log[2nd moment of p n ] =27, 

n n 

provided that e n — > 0, e n — > and ne n — > 00. 

Remark 4.3. One can also write down a result similar to Theorem 
3.6 for the case where W£ = Wk and e n = e, e n = e. The corresponding 
importance sampling estimator, still denoted by p n , will satisfy 

liminf log [2nd moment of p n ] > W(0) — (Ke + Ce), 

n n 

where C is a constant only depends on the system parameter G, under the 
condition that e is small enough. 
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r*, 


©*(r fc ) 




ri 


= 21og(M2/A)(-l,-l) 


(M2,Mi, A) 


Ml > M2 




= 21og( M i/A)(-l,0) 


(Mi, A, ^ 2 ) 






= (0,0) 


(A,mi-M2) 




ri 


= (-21og(Mi/A),-21og(M2/A)) 


(Mi,M2,A) 


Mi < M2 




= 21og(Mi/A)(-l,0) 


(Mii A, H2) 




r3 


= (0,0) 





4.3. Examples and numerical results. In this section we study two ex- 
amples: the individual buffer overflow for two-node tandem Jackson network 
and total population overflow for d-node tandem Jackson network. 

4.3.1. Two-node tandem networks with individual buffer overflow. Con- 
sider the two-node tandem queueing (d = 2) networks with G = (A, fii,^), 
and the quantity of interest is 

p n = {size of queue 1 exceeds B\n or size of queue 2 exceeds B2n 

before the system returns to empty state, starting from 0}. 

One can think of B^n as the individual buffer size for node i. In the nota- 
tion we just introduced, it amounts to T = {x £ :x\ > B\ or xi > B2}. 
Assuming A + fj,\ + fi2 = 1 and A < min{//i, ^2}, we have (following a similar 
argument in [10]) 

7 = lim logp n = min Bi log — . 

n n i=l,2 A 

Consider piecewise afhne subsolutions that take the form W n = Wi A 
W% A W£ where 

W£(x) = (r k ,x) + 2 1 -kS n , 

for some small positive constants 5 n . The choice of and its correspond- 
ing change of measure {0*(rfc)} are given in Table 3. See also Figures 5 and 
6. 

It is not difficult to check that the conditions of Theorem 4.2 are satisfied 
with 

e n = 21og[(/ii V/x 2 )/A] exp{-5 n /e n }. 

It follows that the corresponding importance sampling estimator is asymp- 
totically optimal if 5 n — > 0, e n /5 n — > and ne n — ► 00. 
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4.3.2. d-node tandem networks with total population overflow. Consider 
the total population overflow for a d-node tandem Jackson network with 
d>2, that is, T = {x £ Rj : x\ + + • • • + x& > 1} and 

p n = Pjnetwork total population reaches n before returning to 0, 

starting from 0}. 

Specializing to the case d = 2 (and assuming fii > H2), the results stated 
in this section coincide with those of Section 3. Let p, = /xi A fj,2 A • • ■ A fj,d- 
Assuming A < p and A + H\ + h [id = 1, we have [10] 

7 = lim -- logp n = log 
n n A 

For any fixed n, we consider piecewise afHne subsolutions of form W n = 
W'i A • • • A W2 +1 where 

W^(x) = {r k ,x)+2 1 -k6 n 





Pi > P2 fi\ < J*3 

Fig. 5. The choice of {r^}. 





Fig. 6. Piecewise affine function. 
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for some small positive constant 5 n and 

r , ^ f -27, ifl<i<d+l-jfe, 
^ feJi ~ \ 0, otherwise, 

for 1 < k < d and rd+i = 0. The change of measure corresponding to r& is 



6*(r fc ) 



i - (nd+i-k - aO- 



X I A*, Ml? ■ • • j/^cf-fc) : Hd+2-k, ■ ■ ■ ,L>>d 

for 1 < k < d, and 

@*(r d+1 ) = 6 = (A,/ii,...,/i d ). 
We have the following lemma, whose proof is deferred to Appendix C. 

Lemma 4.4. The following properties hold: 

1. H(rfc) > /or every k, 

2. W™(x) < for all x G i\ 

3. if x & di is such that DW n (x) is well defined then (DW n (x),di) > 0, 

4. ifW 6n ' n denotes the exponential weighting ofW n with e n as the mollifi- 
cation parameter, then 

(DW en ' n (x),di) > -i n = -2 7 ex P {-<5 re /e n } 

for every x £ di . 

Invoking Theorem 4.2, the importance sampling schemes corresponding 
to W £n ' n are asymptotically optimal if 5 n — ► 0, e n /5 n — > and ne n — > oo. 

4.3.3. Numerical results. For all the simulations in this section, we set 
5 = — eloge. The justification for this choice is based on an argument anal- 
ogous to that of Remark 3.7. 

Consider the example of a two-node tandem queue with individual buffer 
overflows as presented in Section 4.3.1. For the case of //i > /12, we set A = 

Table 4 

IS based on subsolutions, two-node tandem, individual buffer overflow, /ii > [i-i 





n = 20 


n = 30 


n = 40 


Theoretical value 


4.81 x KT 12 


3.97 x 10~ 18 


3.47 x 10~ 24 


Estimate 


4.83 x KT 12 


4.04 x 10~ 18 


3.64 x 10~ 24 


Std. Err. 


0.20 x 10~ 12 


0.15 x 10" 1S 


0.18 x 10~ 24 


95% C.I. 


[4.43,5.23] x 10~ 12 


[3.74,4.34] x 10" 18 


[3.28,4.00] x 10" 24 
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Table 5 

IS based on subsolutions, two-node tandem, individual buffer overflow, /ii < fi-z 





n = 20 


n = 30 


n = 40 


Theoretical value 


1.44 x 10 -12 


4.82 x 10 -19 


1.61 x 10" 25 


Estimate 


1.40 x KT 12 


5.01 x 10~ 19 


1.85 x 10~ 25 


Std. Err. 


0.05 x 10~ 12 


0.29 x 10 -19 


0.21 x 10~ 25 


95% C.I. 


[1.30,1.50] x 10~ 12 


[4.43,5.59] x lO" 19 


[1.43,2.27] x 10~ 25 



0.1, fj,i = 0.5, fi2 = 0.4, and B\ = 0.9, B2 = 1. Simulations are generated for 
n = 20,30,40 with e = 0.01. Each estimate consists of 20,000 replications 
(see Table 4). Again, for comparison the theoretical value is obtained using 
an iterative algorithm. For the case of [i\ < ^2, we set A = 0.05, fi\ = 0.35, 
fi2 = 0.6, and B\ = 1, B2 = 0.6. We run simulations for n = 20,30,40 with 
e = 0.1, and each estimate consists of 20,000 replications (see Table 5). 

As for the total population overflow for general d-node tandem networks 
in Section 4.3.2, we run simulations for d = 4 and d = 9. 

For d = 4, we set A = 0.04, [i\ = ■ ■ ■ = jjt^ = 0.24, and run simulations for 
n = 20,25,30 with e = 0.1. Again, each estimate consists of 20,000 replica- 
tions, and the theoretical value is obtained using an iterative algorithm (see 
Table 6). 

For d = 9, we set A = 0.01, /xi = • ■ • = fig = 0.11, and run simulations for 
n = 20,25,30 with e = 0.12. Each estimate consists of 100,000 replications 
(see Table 7). In this benchmark value is obtained using the same 

dynamic importance sampling algorithm but with 10 million replications 
(the iterative algorithm for computing the theoretical value in the case of 
d = 4 does not work here because the state space is too large). 

5. Remarks on general queueing networks. The subsolution approach to 
importance sampling can be extended to general Jackson networks and net- 
works with more general (e.g., Markov modulated) arrival/service processes. 
For example, a theoretical result analogous to Theorem 4.2 that applies to 
general open Jackson networks appears in [15]. Such extensions, even though 



Table 6 

IS based on subsolutions, four-node tandem, total population overflow 





n = 20 


n = 25 


n = 30 


Theoretical value 


2.04 x 10 -12 


5.02 x 10~ 16 


1.10 x 10~ 19 


Estimate 


2.05 x 10~ 12 


5.07 x lO" 16 


1.08 x 10 -19 


Std. Err. 


0.04 x 10~ 12 


0.07 x 10" 16 


0.03 x 10" 19 


95% C.I. 


[1.97,2.13] x 10~ 12 


[4.93,5.21] x 10~ 16 


[1.02,1.14] x 10" 19 
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Table 7 

IS based on subsolutions, nine-node tandem, total population overflow 

n = 20 n = 25 n = 30 

Benchmark value 3.18 x KT 14 9.41 x KT 19 2.16 x 1(T 23 

Estimate 2.93 x KT 14 10.80 x 10~ 19 1.98 x 10~ 23 

Std. Err. 0.23 x 10~ 14 1.30 x 10~ 19 0.30 x 10~ 23 

95% C.I. [2.47,3.39] x 10~ 14 [8.20,13.10] x 10~ 19 [1.38,2.58] x 10~ 



routine to some degree, have a few distinctions. One is that Neumann-type 
boundary conditions, which were adequate for tandem networks, are not 
sufficient anymore in general, and the more elaborate boundary Hamiltoni- 
ans have to be considered instead. Another distinction is that the geometric 
properties of the interior and boundary Hamiltonians are much less transpar- 
ent. For instance, the Markov modulated case requires solving an eigenvalue 
problem to obtain the Hamiltonian. Consequently, explicit formulas for the 
gradients of the needed affine pieces are no longer available, and must be 
computed numerically [9]. 

In order to illustrate some of the ideas of these generalizations, we con- 
sider the following two-node Jackson network with feedback. Again assume 
Poisson arrivals with rate A and consecutive exponentially services with rate 
Hi at node i. However, after being served at node 2, a job has probability f3 
to be returned to node 1 (see Figure 7). Note that the full two node model 
that includes self-feedbacks and multiple arrival streams can be treated in a 
completely analogous fashion, albeit with more involved computations. 

Suppose that the quantity of interest is the probability of total population 
overflow, 

p n = Pjnetwork total population reaches n before returning to 0, 

starting from 0}. 

Let ji = fi\ A Assuming the stability condition A < fl(l — (3), and without 
loss of generality, A + A*i + A*2 = 1, we have [10] 

1 

7 = hm logp n = log . 

n n A 




Fig. 7. Two-node network with feedback. 
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Fig. 8. 5toe dynamics. 



5.1. System dynamics. Let Z = be the embedded discrete time 
Markov chain that represents the queue lengths at the transition epochs of 
the network. Then the dynamics of Z can be modeled by 

Z(k + 1) = Z(k) + n[Z(k),Y(k + 1)], 

where {Y(k)} are i.i.d. random variables taking values in 

v = {^o = (i,o),m = (-1, i),v 2 = (o,-i), v 3 = (i,-i)}, 

and the mapping tt is defined as 

{0, if z\ = and y = v\, 
0, if z 2 = and y = t; 2 or u 3 , 
y, otherwise. 

Under the original probability measure P, the distribution of Y(k) is just 

9 = (A, w ,(l-% 2 ,fe)eP + (V). 
See Figure 8 for an illustration of the boundary dynamics. 

5.2. The Isaacs equation and boundary Hamiltonian. Following the argu- 
ment in Section 3.5, one can write down the Isaacs equation M(DW(x)) = 
for x £ D, where 

M(p) = sup inf 

e eP +( V) 0eP+(v) 

with 

3 

¥(6) = J20[vi] ■ v u 

i=0 



(p,F(0)) + £%]io g ^ + i?(0||e) 
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and the Dirichlet boundary condition W(x) = for x € d e . 

However, as far as the boundaries d\ and <9 2 are concerned, the Neumann- 
type boundary condition {DW(x), di) = is not sufficient (more precisely, 
it is not sufficient for 9 2 , since the direction of constraint is not well de- 
fined on d 2 ). Instead one has to resort to a boundary Hamiltonian [6], and 
consequently, the boundary conditions become 

U 9 .(DW(x))=0 for x E d h i = 1, 2, 

where the boundary Hamiltonian Mg. is defined exactly as EI except ¥(6) is 
replaced by ¥{(6) with 

Fi(0) = E0H-^ F 2 (0)= ]T 6[vi] ■ Vi. 

Remark 5.1. Proposition 4. 1 can be easily applied to the interior Hamil- 
tonian EI and the boundary Hamiltonian M.g t to show the existence of saddle 
points and the concavity of these Hamiltonians. The formulae for the sad- 
dle points follows. Let (6*(-), #*(•)) be the saddle point for EI, and 
(6£.(-),0£.(-)) be the saddle point for Ha,. Then 

Q*(p) = 0*(p) = N{p) ■ (Ae- pi/2 ,^e (pi " P2)/2 , 

(l-/% 2 e^/W P2 ~ Pl)/2 ), 
©a» = 9l (p) = mip) ■ (Ae^/ 2 ,^, (1 - /% 2 e^ 2 ,/W P2 ~ Pl)/2 ), 
®UP) = °UP) = N *(P) ■ (\e-^l\^-^l\ (1 - (3)^,^2), 

where N(p),Ni(p) are normalizing constants so that all these vectors are 
probability vectors [i.e., elements in V + (V)]. Moreover, EI(p) = 2log N(p) 
and M 9i (p) = 2log Ni{p). 

5.3. Piecewise affine subsolutions and mollification. The definition of a 
classical subsolution is the same as Definition 3.3, except that Neumann 
boundary inequality (DW(x), di) > is replaced by Mg i (DW(x)) > for 
x £ di, i = l,2. 

The construction of a piecewise affine subsolution is very similar to that 
in Section 3.8.1 (see Figures 9 and 10). Define 

n = 2 7 (-l,-l), r2 = 2 7 (-l,0) + 2( 7 -a)(0,-l), r 3 = (0,0), 

where a € (0,7] is given by 

a ^ f !og[^i/(Mi + A - (1 - /% 2 )], if Mi > M2, 

\ l og[m/(x + Pni)], if/xi</i 2 . 
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Pi > Mi < 1*2 

Fig. 9. The Hamiltonians and the choice of {rk}- 

Let W 5 = Wf A Wi A where 

Wf(x) = (n,x)+2^-6, 
Wi(x) = {r 2 , x) +27-2(5, 

0«0 = fa, x) + 2 7 - (1 + 27/o)<5. 

The exponential weighting of with parameter e yields a smooth function 

W e ' 5 (x) = -elog^T exp(--W£(x)X 
k=i £ > 

*2 




5 

Fig. 10. The piecewise affine subsolution. 
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that satisfies 

We have the following result, whose proof is deferred to Appendix C. 

Lemma 5.2. For each k we have H(rfc) > 0, and the function W £ ' S sat- 
isfies: 

1. M(DW £ ' 5 (x)) >0forxeD, 

2. W £ ' 5 (x) < for x G <9 e , 

3. /or eac/i i = 1, 2, and x G <9j, 

3 

H^W^x)) > £ ^ 5 (x)M 9i (r fc ) > -Cexp{-,5/e} 

k=l 

for some constant C that only depends on the system parameter 0. 

5.4. The importance sampling scheme and its asymptotics. Define the 
scaled state process X n (k) = Z(k)/n. Dynamic importance sampling schemes 
are characterized by stochastic kernels n [-|-] on V such that Y(k + 1), con- 
ditional on {Y(l), . . .,Y(k)}, has distribution e n [\X n (k)] G V+(V). 

The importance sampling scheme based on W £ ' 5 is as follows. Define the 
stochastic kernel £,<5 [-|-] on V by 

e £ ' S [-\x) = Y,Pl' S ^)&*(r k ) ifxeD 
k=l 

and 

e e ' 5 [-N = ^^(x)9^(r fc ) ifxGd,. 
k=l 

Here the formulae for 0* and Qq, can be found in Remark 5.1. We will allow 
e and 5 to be n-dependent, denoted by e n ,5 n , and let n [-|-] = £n ' 5 "[-|-]. 

Denote by p n the corresponding importance sampling estimator. We have 
the following result, whose proof is very similar to that of Theorem 3.8. In- 
deed, in the proof of Theorem 3.8, the Neumann boundary condition is used 
to derive (implicitly) certain inequalities associated with boundary Hamil- 
tonians. Such inequalities can now be obtained using Lemma 5.2. We omit 
the details. 

Theorem 5.3. The importance sampling estimator p n is asymptotically 
optimal if 5 n — ► 0, e n /6 n — > and ne n — > oo. 
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Table 8 

IS based on subsolutions, two-node tandem with feedback, fii > /12 





n = 20 


n = 30 


n = 40 


Theoretical value 


9.60 x 10" 11 


2.66 x 10 -16 


7.27 x 10 -22 


Estimate 


9.31 x KT 11 


2.60 x 10~ 16 


7.33 x 10~ 22 


Std. Err. 


0.17 x 10~ n 


0.07 x 10" 16 


0.33 x 10~ 22 


95% C.I. 


[8.97,9.65] x lO" 11 


[2.46,2.74] x 10~ ie 


[6.67,7.99] x 10~ 22 



One can also use a fixed pair of parameters e and 8 for all n, which leads 
to a result similar to Theorem 3.6 and suggests a good choice may be to 
take S n = -e n loge n . 

5.5. Numerical results. For all the simulations in this section, we set 
e = 0.02 and 5 = — eloge. For the case of fj,\ > fi2, we choose A = 0.1, /xi = 
0.5, \i2 = 0.4, and j3 = 0.1. Each estimate consists of 20,000 replications (see 
Table 8). The theoretical value is obtained using a numerical iterative algo- 
rithm. 

For the case of fi\ < fiz, we choose A = 0.1, //1 = 0.43, fj,2 = 0.47, and 
= 0.2. Each estimate consists of 20,000 replications (see Table 9). 

APPENDIX A: A LARGE DEVIATION RESULT 

In this appendix we prove a large deviation result that may be of some 
independent interest. Recall the definition of process Z by (3.2): 

Z(k + 1) = Z(k) + n[Z(k),Y(k + 1)], 

where {Y(k)} is a sequence of i.i.d. random variables taking values in V = 
{vq,vi,V2} with distribution O = (A,/ii,/i2)- Define the hitting times 

o n = inf{/c >0:Z 1 (k) + Z 2 {k) = n}, 

do = inf{/c > : Zi(k) + Z 2 (k) = 0}. 

We also let Z„ = {{z\,z 2 ) G : z\ + Z2 < n}. 



Table 9 

IS based on subsolutions, two-node tandem with feedback, fii < /12 





n = 20 


n = 30 


n = 40 


Theoretical value 


4.39 x 10~ 1(J 


2.13 x 10 -15 


9.60 x 10" 21 


Estimate 


4.62 x 10" 10 


1.91 x 10" 15 


9.88 x 10~ 21 


Std. Err. 


0.46 x 10" 10 


0.13 x 10" 15 


0.87 x 10~ 21 


95% C.I. 


[3.70,5.54] x 10" 10 


[1.65,2.17] x 10" 15 


[8.14,11.64] x 10" 21 
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Q(k) 



Proposition A.l. There exists a constant c > 0, which only depends 
on the system parameter (X,m,/j,2), such that 

limsup sup ilog£' 2 [e c(CTnAf7o) ] < oo. 
Here E z denotes expectation conditioned on Z(0) = z. 

The main difficulty in proving such a result is that the definition of a"o 
requires that the state process hit a single point, and that it is not suffi- 
cient to consider instead a small neighborhood of this point. The key idea 
to overcome this is to study a closely related one-dimensional process. Let 
S(z) = £^[00] for every z G Z+. S is finite, thanks to the stability assump- 
tion. Define the process 

(S(Z(k)), iffc<a , 
\ do — k, if k > (To- 

In other words, the process Q is random until the process Z hits the origin, 
after which Q becomes deterministic and decreases by 1 each step. The 
scaled continuous-time piecewise affine interpolation process is just 

Qn(t) = -Q([nt\) + nt ~ LntJ [Q([nt\ + 1) - Q([nt\)], 
n n 

for t > 0. 

In order to give a large deviation upper bound for the processes {Q n }, we 
need the following definitions. Fix a€K, For each z € define 

(A.l) h(z; a) = log£ z exp{a(Q(l) - Q(0))}, 

(A. 2) H(a) = sup h(z;a). 

Clearly, H is convex since h(z; •) is convex for each z. The convex conjugate 
of H is denoted by L, or, 

(A.3) L(fl) = sup[a(3-H(a)}. 

The function L is nonnegative since H (0) = 0, and it will serve as a local 
upper rate function. For any fixed time T > 0, let C([0,T];M) be the Polish 
space of continuous functions on interval [0, T] equipped with the supremum 
metric p. Define a mapping It : C([0, T]; R) — > R + U {oo} by 

rT 

L(cj)(t))dt, if is absolutely continuous, 
otherwise, 
and denote its level set by 

= {</>£ C([0, T]; R) : 0(0) = x, 7 T (0) < s} 



It{4>) 
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for every i£l and s > 0. 

We have the following results, whose proofs are deferred to the end of this 
appendix. Proposition A.l is a consequence of these two lemmas. 

Lemma A. 2. There exists a constant M > such that S(z) < M{z\+Z2) 
for every z € and the absolute value of all increments of {Q(k)} are 
uniformly bounded by M . 

Lemma A. 3. Let T> be given. 

1. ir(0) > for every 4>, and It(4>) = if and only if <f>(t) = —1 for a.e. 

te[0,T}. ' 

2. There exists a constant K such that ir(0) is finite only if (j) is Lipschitz 
continuous with Lipschitz constant K . 

3. Given any compact set FcR, the union of level sets, [j xeF ^ x (s), is 
compact for any s > 0. In particular, It is lower semicontinuous. 

4. For any h > and s > 0, we have 

limsup sup -logF z {p(Q n ,$ S r z)/n (s)) > h} < -s. 

n z&L n n 

Proof of Proposition A.l. Let M be the constant in Lemma A. 2, 
and K be the Lipschitz constant in Lemma A. 3. For any 5 > and T > 0, 
define 

F| = {<£eC([0,T];]R):^(0) e [0,M],—6 <4><M + 5, 

4> is absolutely continuous, \<j)\ < K V M}, 

which is a compact subset of C([0, T];R). It is not difficult to see that It(4>) > 
for any (f>E Fj, if T > M + 5. Indeed, suppose irW>) = 0. Then by Lemma 
A.3 we have <p(t) = 0(0) - t. If (p(0) G [0, M] then for any M + 5 < t < T, 
0(t) = <j)(0) - t<-5. Thus ^ i F T . It follows that, as long as T > M + S, 
min{/j'((/)) : cj) € i 7 ^} > 0, thanks to the lower semicontinuity of It and the 
compactness of F T . 

Now fix an arbitrary 5 (the specific value of 5 is not important), and let 
to = M + 45. Define 

S = ±min{I^):<£eif/}>0. 

For any x and € <& x (s), by Lemma A.3 again, <f> is Lipschitz continuous 
with |0| < A'. However, ^(s) H -F^ 5 = by definition of s. Therefore, for 
any x € [0, M] and € &x(s), we must have 

(A.4) inf {t > : 0(t) £ [-25, M + 25]} < t . 

Define the stopping time 

T S n = \ni{t>0:Q n {t) i [S,M + S\}. 
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Thanks to Lemma A. 2, Q n has Lipschitz continuous sample paths with 
\Qn\ < M. Moreover, for any initial condition Z(0) = z £ Z n , Lemma A. 2 
implies Q n (0) = S(z)/n G [0,M]. It follows that 

F z (4>t )=F z (Q n EFl). 

Thanks to equation (A. 4), for every Q n € F^, we have p(Q n , &s(z)/n( s )) > 
Therefore, 

P*(r* > to) < F z (p(Qn, *$(,)/«(*)) > 5). 

However, it follows from Lemma A. 2 that {a n A o"o > nto} C {r^ > to} for 
n> M /5. As a consequence, 

limsup sup — \ogf z {a n A o"o > nto) 

n zez„ n 

< limsup sup — logP 2 (-r^ > to) 

< limsup sup - logF z (p(Q n ,$ s , z y n (s)) > 5) 

n z&L n n 

<s, 

here the last inequality is by Lemma A. 3. In particular, 
sup ¥ z (a n A cj > [nt \ + 1) 

< sup F z (a n A cr > nt ) < e~ ns/2 
z<=Z„ 

for n big enough. Let k n = \nto\ + 1- Thanks to the Markov property, for 
all sufficiently large n and all j > 

sup ¥ z (a n Aa >jk n )<e- jns ^. 

Let c be any constant such that < c < s/(4to). We have, for n big enough, 
ck n < ns/4, which implies that 

oo (j+l)fc„-l 

= £ E e " P *K A a = i) 

j=0 i=jk n 
oo 

< e <*„ £ e c ^p,(jfc n < a„ A a < (j + l)k n - 1) 

j=0 



< e ckn e~^ ns ^ 2 ~ ck " 1 

3=0 
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oo 



< e ckn e~ jns/i 

rk 1 

— g cfi,ri 

\ _ g-ns/4 ' 



Therefore, 



1 ck 1 1 

lim sup sup - log £ z fe c(<T " A<T ° } 1 < lim — + lim - log n = ct . 

n z( zz n n n n n n 1 - e ~ ns ' A 

This completes the proof. □ 

It remains to show Lemmas A. 2 and A. 3. We begin with the following 
result, whose proof is a straightforward consequence of the definition of 
Q(k) and thus omitted. 

Lemma A. 4. Let F k = a(Z(0),Y(l), . . . ,Y(k)). Then 
E z [Q(k + l)-Q(k)\T k ] = -l 
for every z G and every k>0. 

The next lemma is concerned with the monotonicity of the sample path 
with respect to the initial conditions. To be more precise, for Zj_, we 

say z < z if the inequality holds component wise. Also for z £ denote by 
Z z the sample path corresponding to initial condition z, that is, 

Z 2 (0) = z, Z z (k + 1) = Z z (k) + ir[Z z (k), Y(k + 1)]. 



Lemma A. 5. Define g : T?\ — > Z + by g(z) = z\ + Z2- Given any z,z € Z?_ 
such that z < z, 

Z z (k) < Z z {k), 
g(Z z (k))-g(Z- z (k))<g(z)-g(z) 

for every k>0. 

Proof. We use induction. The claim is trivial for k = 0. Assume for 
now that it holds for some k > 0. Introduce the following notation: 

T = {z£Z 2 + :zi >0,z 2 >0}, 

T 1 = {zeZ 2 + :z 1 = 0,z 2 >0}, 

T 2 = {zeZ 2 + :z l >0,Z2 = 0}. 
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We consider the following possible scenarios separately: (i) Z z (k) G T; 
(ii) Z~ z {k) G T i; (iii) Z z (k) G T 2 ; (iv) Z z (k) = 0. Since the proofs for these 
cases are essentially the same, we choose to only present case (ii). Assume 
that Z z (k) G Ti. Thanks to the induction hypothesis Z z (k) < Z z (k), we 
must have Z z (k) G Y\ U T. If Z z (k) G Ti, or Z z (k) G T but Y(k + 1) ^ v Xl 
themr[Z 2 (k),Y(k + l)] = Tr[Z z (k),Y(k+l)] and the claim holds for k+ 1. For 
the case where Z z {k) G T and y(A; + 1) = v±, we have Z z (k + 1) = Z z (k) and 
Z 2 (A; + 1) = + = Z z (k) + (-1, 1). Since Z{(k) > and Zf(fc) = 0, it 

follows that Z~ z (k + 1) < + 1). Also note that g(Z z (k + 1)) = g(Z z (k)), 
g{Z z (k + 1)) = g(Z z (k)). This completes the proof. □ 

Proof of Lemma A. 2. Let M = 2S'((1,0)) + 25((0, 1)). We would like 
to show that for any z G and any i = 0, 1, 2, 

(A.5) |5(z + vr[z,^])-5(z)j < M. 

We can assume that 7r[2;,Uj] = V{, since otherwise there is nothing to prove. 
First we consider the case i = 2. Let z = z -\- V2 = (z±,Z2 — 1) < z. Define 
stopping times T z = inf{/c > : Z z (k) = 0} and T z = inf{/c > : Z z (k) = 0}. 
Thanks to Lemma A.5, we have Z z (k) < Z z (k) for any k > 0, which implies 
rpz < j, z _ gy ^.^g game i emma5 g(Z z (k)) — g(Z z (k)) < g(z) — g(z) = 1 for every 
fc. In particular, for /c = T 5 , it yields c/(Z z (T s )) < 1. Therefore Z 2 (T 2 ) G 
{(0, 0), (1, 0), (0, 1)}. Now the strong Markov property yields 

S(z) = S(z) + ¥{Z Z (T Z ) = (1, 0)}S((1, 0)) + ¥{Z Z (T Z ) = (0, 1)}5((0, 1)). 

Thus \S(z) - S(z)\ < 5((1,0)) + 5((0,1)) < M/2. The proof for the case 
i = is almost verbatim. For i = 1, z + Uj = 2i + (—1, 1). One can use the 
same argument to prove that \S(z) — S(z + (— 1,0))| < M/2 and \S(z + 
(—1,0)) — S(z + (— 1,1))| < M/2, and then use the triangle inequality to 
show \S(z + vi) - S(z)\ < M. We omit the details. 

It follows from (A.5) that the increment of {Q(k)} is uniformly bounded 
by M [note that M > 1 trivially since S(z) > 1 for every Now for 

every z G we can write S(z) as 

S(z) = [S(z) - 5((0, z x + z 2 ))\ + [5((0, «i + z 2 )) - S((0, 0))] 
= ^[5(z + ^!)-5(z + (i + l) Wl )] 

i=0 

+ ^2 [S{z + zivi - iv 2 ) - S(z + zivi - (i + l)v 2 )]- 

i=0 

Thanks to (A.5) again, the absolute value of each summand is bounded by 
M. Thus S(z) < M(2zi + z 2 ). Taking M = 2M completes the proof. □ 
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Proof of Lemma A. 3. Clearly h(z; •) is convex for every z, H is con- 
vex, and H(0) = 0. Let M be the uniform bound on the increments of 
{Q(k)} given by Lemma A. 2. It follows that |/i(z;a)| < M|a|. Therefore 
|ff(a)| < M\a\ for every a, whence H is Lipschitz continuous (thanks to its 
convexity) . 

We claim that H is differentiable at a = and H'(0) = —1. Indeed, since 
h(z;a) is differentiable with respect to a and h(z;0) =0, we have 

H(a) h(z;a) 

= sup = sup D a n{z; a[z\ j, 

where a[z] is some number between and a. But Lemma A. 4 implies that 
D a h(z;0) = E Z [Q(1) — Q(0)] = —1, while Lemma A. 2 and simple algebra 
yield that \D aa h(z;a)\ < K for some constant K and for every z and a. 
Therefore, 



H(a) 



<K\a\ 



Letting a — > 0, it follows that H is differentiable at a = with H'(0) = — 1. 

The convexity of H and H (0) = imply that L, defined by (A. 3), is convex 
and nonnegative. The Lipschitz continuity of H implies that L takes value 
infinity outside a compact set. Lastly, H'(0) = — 1 imply that L{j3) = if and 
only if P = — 1. Parts 1 and 2 of Lemma A. 3 follow from these properties of 
L. The rest of the lemma follows from Theorem 4.1 of [5] and that 

L(/3) < l{z; /3) = sup[a/3 - h(z; a)]. 

a 

This completes the proof. □ 

APPENDIX B: PROOF OF MAIN THEOREMS 

We put the proofs of Theorem 3.6 and Theorem 3.8 together in this ap- 
pendix. These proofs are, in essence, verification type arguments. 

Lemma B.l. The function W e ' S as defined in (3.13) satisfies the follow- 
ing properties: 

1. M(DW e ' 5 (x)) > for all x € D. 

2. W £ ' 5 (x)<0 for allxed e . 

3. (DW £ ' S (x), di) > — 27exp{— 5/e} for every x <G d{. 

4. There exists a constant C which only depends on the system parameter 
(\, Hi, H2) , such that 

d 2 W e '\x) 



dxi dxj 



£ 



for every x £ D and every i,j. 
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Proof. Thanks to (3.14), the concavity of EI (Proposition 3.2), and that 
H(r fc ) > 0, it follows that 

M(DW e ' 5 (x)) = M Pf(x)r k ) > £ Pk\ x M r k) > 0. 
\k=l / fe=i 

By Lemma 3.5 we have W e >\x) < W s (x). But W s (x) < for x G d e by 
definition, and so the second claim follows. 

Since (ri,di) = (r 3 ,di) = and (r 2 ,di) = -27, we have (DW £ ' 5 (x),d\) = 
-2-/p £ 2 '°(x). For x G #i, thanks to (3.15) and (3.12), we have 

e,5 ( v ^ exp{-V^(a)/e} 
exp{— Wg (xj/e} 

Once can treat x G 82 in an analogous fashion. This completes part 3. 

Denote by ej the standard ith unit vector. It follows easily from (3.14) 
and (3.15) that 

d^ s (x) _" dpf(x) 

dxidxj dxj ' 



flp *' (x) -i.p£'(s) 



m=l 

e,S, 



dxj e 

The last claim follows readily since p E jf{x) is bounded between and 1. □ 



We now define a few functions that are closely related to the interior and 
boundary Hamiltonians. For each a > and 0, 9 G V + (V), let 



Z(a,p; 6, (?) = (! + a)(p, ¥(0)) + (1 + 2a) £ %] log ^ + i2(0||6) 



i=0 



Similarly, for each j = 1, 2, let Fj(#) = ®[ v i\ ' u i an d 

4(^9,0) = (1 + a)(p,Fj(0)) + (1 + 2a)5>M log -^j + fl(0||0). 

i=o "l^J 

Lemma B.2. Let peR 2 suc/i tfiai H(p) > 0. Then for any a > 0, 

inf I(a,p;e*(p),fl) > 0, 

where Q*(p) is as defined in Proposition 3.2. 



36 P. DUPUIS, A. D. SEZER AND H. WANG 

Proof. By definition of L, (3.7), and Proposition 3.2, it follows that 
L(a,p; 6* (p) ,6) = L(0,p;&*(p),0) + 2a log N(p) 
= L(0,p;e*(p),8) + aU(p). 
However, thanks to Proposition 3.2 again, we have 

M L(0,p; Q*(p),0) = L(0,p; O* (p) , 9* (p)) = M(p) . 

This completes the proof. □ 

Proof of Theorem 3.6. To ease exposition, we adopt the notation 
W = W e,s , pk = p 6 ^ , and set e = 27exp{— 5/e}. Fix any a > 0. We claim 
that 

(B.l) inf L(a,DW(x);e n [-\x},e) > 0. 

6»GP+(V) 

Indeed, thanks to the definition of L, the concavity of the logarithmic func- 
tion, and that DW(x) = J2 Pk(%) r k: @ n [-\x] =J2Pk(x)®*( r k), we have 

2 

L(a, DW(x);@ n [-\x},9) > £ p k (x)L(a, r k ; 9*(r k ),6). 

Inequality (B.l) follows readily since M(r k ) > and Lemma B.2. Note that 
for every x 6 dj n D n , thanks to (B.l), 

Lj(a,DW{x);@ n [-\x],9) 

= L(a,DW{x);& n [-\x],9) - (l + a)9[vj] ■ (DW(x),Vj) 

>-{l + a)0[ Vj ]-(DW{x),Vj). 
Recalling that dj = —Vj, by Lemma B.l we arrive at 
(B.2) inf LAa,DW(x)-Q n \-\x],e) > -(l + a)e. 



We now show that inequalities (B.l) and (B.2) imply that for every x £ D Ti 

2 

w(x+-ir(x,Vi)) -W(x) \ ■ 0[vi] 



>-(l + a) 



i=0 

C 

— + e 

ne 



inf ^V(l + a)n 
(B.3) + (1 + 2a) £ %] log + ij^ne) 
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where C is a constant that only depends on the system parameter (A, /ii, /X2) 
To this end, consider separately the cases x 6 D n (interior) and x E dj (1 D T: 
(boundary). For x 6 D n , ir(x,Vi) = Vi. Therefore, by Taylor's expansion, 



n 



W\x + ^v^j -W{x) 



= (DW(x), Vl ) ■ 6[ Vi ] + —{vuD^ix^Vi) ■ 0[vi], 

In 

where Xi is some point on the line connecting x and x + vi. Thanks to Lemma 
B.l, the definition of F [see (3.7)], and that ||fi|| 2 < 2, we have 



E 

i=0 



11 



6[vi]>(DW(x),W(e))-—. 

ne 



This and (B.l) immediately lead to (B.3). The case of x € dj C\D n is similar, 
except now that n(x, Vi) = if i ^ j and tt(x, Vj) = 0. We omit the details. 

Applying the relative entropy representation (Remark 3.1) to the left- 
hand side of (B.3) and adopting the notation (3 n = C /{ne) + e, we have 



(1+q)/3„ . ~(l+a)n[W{x+TT(x,Vi)/n)-W{x)} 



i=0 



Q n [vi\x] 



l+2a 



•6N<1 



for every x € D n . Recalling the definition of X n in (3.4), this display implies 
that the process M = {M(k) : k > 0}, where 



M(k) = e 



-(l+a)l3 n k e -(l+a)nW(X"(k)) 



fk-l 

n 

^=0 



e[Y(j + i)\ 



l+2a 



e n \Y(j + i)\x n (j)] 



is a super martingale under the original probability measure P. Thanks to 
the optional sampling theorem and the nonnegativity of M, 

E r M(T n A T ) < £ P M(0) = e -(i+«)»W(o) . 
Since p n =Pn- 1{t„<t } and W(x) < for x € <9 e , 
M(T„AT )>M(T n )-l {Tri<To} 

_ -(l+a)/9nT„„-(H-a)nW(X™(Tn))^l+2o 



> ~(l+a)/3„T„ -l+2a 
— c 



It follows that 



£; Pj e _(l +Q ) / 3 n r„^l+2aj < e -(l+a)nW(0)_ 
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By Holder's inequality, 
[2nd moment of p n ] 
= E ¥ \p n ] 

< e -((l+a)/(l+2a))nW(0) . £ JPr e ((l+a)/(2a))/3„ (T„AT )j2a/(l+2a) ^ 
which yields 

lim inf log [2nd moment of p n ] 

n 77, 

(B.4) 

> 1±^ W (0) - ^^li mS up-lo gj E p [e« 1+a )/( 2a )^( T " AT »)]. 
Let c be the constant in Proposition A.l, and let 

C = lim sup sup - log^[e c(T " ATo) ]. 

n x€D n n 

It follows immediately from Proposition A.l that C is finite. Note that (B.4) 
holds for any a > 0. In particular, it holds for a = e/c. With this choice of 
a, we have 

1 + a _ 1 + a C e c 
2a ~ 2a + 2 + 2' 
Therefore, if e < c, then for n big enough, 

1 + Q « 

Pn < C. 



2a 

It follows from (B.4) and W(0) < 27 that 

1 1 + a 2a - 

liminf log [2nd moment of p n ] > W(0) C 

n n 1 + 2a 1 + 2a 

= W(0)-e^- Y _[W(0)+2C] 

>W(0)-e-[27 + 2Cl. 
c 

It follows from Lemma 3.5 that 

(B.5) W(0) = PF M (0) > W*(0) - 3e = 2 7 - 35 - 3e. 

Recall that e = 27exp{— 5/e}. We conclude the proof by setting A = 27^7- 
2(7] /c, and to enforce e < c (which was assumed in the proof) we set £> = 
l/log(27/c) if c<27 and 5 = 00 if c > 27. □ 
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Proof of Theorem 3.8. It suffices to show that 

liminf log [2nd moment of p n ] > 27, 

since the other direction is automatic by Jensen's inequality (see Section 
2). We use the notation W n = W £n > Sn , pi = p £ k n ' Sn , and e n = exp{-5 n /e n }. 
The same argument leading to inequality (B.4) gives that, for any strictly 
positive sequence {a n }, 

lim inf log [2nd moment of p n ] 

n n 

1 -I- n 

> liminf n W n (0) 

~ n l + 2a n y 1 

_ l im sup 2a " I bg E ¥ [ e ((i+«»)/(2«n ))/3»(^AT )i 
n 1 + 2a n n 



where 



0n — £-n- 



In particular, we should choose a n so that 

1 + a n n 
-p n = c or a r , 



2a n 2c - P n 

Note that a n is strictly positive (at least for n big enough) and a n — > since 
P n — > by assumption. It follows that 

liminf log [2nd moment of p n ] > liminf W n (0). 

n n n 

However, by (B.5) W n (0) > 27 — 35 n — 3e n - This completes the proof. □ 
APPENDIX C: COLLECTION OF MISCELLANEOUS PROOFS 



Proof of Lemma 4.4. Clearly, M(r d+1 ) = M(0) = 0. For 1 < k < d, 
Proposition 4.1 implies that H(rfc) = 21ogA^(rfc) where 

1 _ X/Id+l—k 

——— =n + n 1 -\ h Hd-k H = h Pd+2-k H V Vd- 

N(r k ) 11 

In order to show H(rfc) > or N(rk) > 1, it suffices to show that 

/2 + Xpd+i-k/li < A + Pd+i-k, 

or equivalently, (pd+i-k — fi)(P- — A) > 0, which directly follows from the 
assumptions. Furthermore, for cc € T we have 

IU n (x) < (x) = -2 7 (xi + x 2 + • ■ • + z d ) + 2 7 - 5 < -5 < 0. 
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Now assume x G di for some 1 < % < d. Suppose DW n (x) is well denned, 
or equivalently, Wi(x) A • • • A = (x) for some unique k*. In this 

case, DW n {x) = r k *. In order to show (rfc*,dj) > 0, note that 



(CI) (r fc ,d 4 ) 



-27, iffc + i = d + l, 
0, otherwise. 



Thus it suffices to show that fc* ^ d + 1 — i. This is true since the definitions 
of {r^} and Xi = imply 

Wd +2 -i(^) = (r^a-i.z) + 7 - (d + 2 - i)5 n 
= (rd+i-i,x) +7 - (d + 2 - i)£ n 
= W d n +1 _ i (x)-<5 

It remains to show that (LW £n ' n (x), dj) > —2~fexp{—5 n /e n } for x G <9j. 
Since DW £n ' n (x) = J2 d k t\ p £ k n ' n (x)r k and for x&di, 

" exp{-^ +2 _ j( x)/.4 " eXPi 
the desired inequality follows from (C.l). □ 

Proof of Lemma 5.2. We will only present the proof for the case 
H\ < /J.2, and omit the analogous proof for p\ > //2- 

Assume p,\ < p2 hereafter, and use the notation W = W £ ' S and p k = p e k ' S ■ 
The formulae in Remark 5.1 yield 

H(n) = 21ogiV(r 1 ) = -21og[(l - p)m + m + PH2 + — 

L m 

By assumption A < (1 — (3)p\ and p\ < fJ-2, it follows that 

^-lV(l-/3)/ii-A)>0 or (i-/3) Ail + ^<A + (l-/3)/x 2 . 
Mi / Mi 

Since A + p\ + p,2 = lj we have H(n) > 0. Similarly, we have H(r2) = 
and H(r3) = 0. Thanks to the concavity of H, DW(x) = J2k Pk( x ) r k, and 
J2kPk( x ) — 1) Pk( x ) — 0) we have M.(DW(x)) > 0. As for x £ d e , we have 
W(x)< {r 1 ,x) + 2j-S = -S<0. 

It remains to show part 3. For x £ di, the concavity of H& implies that 



3 2 

M di (DW(x)) > ]Tp fc (x)Ha> fc ) = $> fc (x)H Si (r fe ). 

A=l fc=l 
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However, it is not difficult to check that Mq 1 (ti) > and Hg 2 (r2) = 0. 
Therefore, we only need to show p2(x) < exp{— 5/e} for x £ d\ and p\{x) < 
exp{— 5/e} for x £ 82- For x 6 82, we have X2 = and 

exp{-^(x)/ g } 

For x = (0,2:2) € <9i, we consider two cases: X2 < £2 and X2 > a?2 separately, 
where x\ = 5 /a. For X2 < x\, we have 

, . exp{-Wf (x)/e} f2(7-a) A 2 7 \ <n r ... 

Similarly, for x > we have 

, . exp{-W 2 a (x)/e} f 2a 5) r «. , , 

P2 * < I w ' (7 { = exp --X2 + - < exp{-o/e}. 

This completes the proof. □ 



REFERENCES 

[1] Asmussen, S. (1982). Conditioned limit theorems relating a random walk to its 

associates, with applications to risk reverse process and GI/G/1 queue. Adv. in 

Appl. Probab. 14 143-170. MR0644012 
[2] Asmussen, S. (2003). Applied Probability and Queues. Spring, New York. MR1978607 
[3] De Boer, P.J. (2006). Analysis of state-independent importance sampling measures 

for the two-node tandem queue. ACM Trans. Modeling Comp. Simulation 16 

225-250. 

[4] Dupuis, P. and Ellis, R. S. (1997). A Weak Convergence Approach to the Theory 
of Large Deviations. Wiley, New York. MR1431744 

[5] Dupuis, P., Ellis, R. S. and Weiss, A. (1991). Large deviations for Markov pro- 
cesses with discontinuous statistics, I: General upper bounds. Ann. Probab. 19 
1280-1297. MR1112416 

[6] Dupuis, P., Ishii, H. and Soner, H. M. (1990). A viscosity solution approach to the 
asymptotic analysis of queueing systems. Ann. Probab. 18 226-255. MR1043946 

[7] Dupuis, P. and Wang, H. (2004). Importance sampling, large deviations, and dif- 
ferential games. Stoch. Stoch. Rep. 76 481-508. MR2100018 

[8] Dupuis, P. and Wang, H. (2005). Dynamic importance sampling for uniformly re- 
current Markov chains. Ann. Appl. Probab. 15 1-38. MR2115034 

[9] Dupuis, P. and Wang, H. (2007). Subsolutions of an Isaacs equation and efficient 
schemes for importance sampling. Math. Oper. Res. To appear. 
[10] Glasserman, P. and Kou, S. (1995). Analysis of an importance sampling estimator 

for tandem queues. ACM Trans. Modeling Comp. Simulation. 4 22-42. 
[11] Heidelberger, P. (1995). Fast simulation of rare events in queueing and reliability 

models. ACM Trans. Modeling Comp. Simulation 4 43-85. 
[12] Lions, P.-L. (1985). Neumann type boundary conditions for Hamilton- Jacobi equa- 
tions. Duke Math. J. 52 793-820. MR0816386 
[13] Parekh, S. and Walrand, J. (1989). A quick simulation method for excessive back- 
logs in networks of queues. IEEE Trans. Automat. Control 34 54-66. MR0970932 



42 



P. DUPUIS, A. D. SEZER AND H. WANG 



[14] Sadowsky, J. S. (1991). Large deviations and efficient simulation of excessive 
backlogs in a GI/G/m queue. IEEE Trans. Automat. Control 36 1383-1394. 
MR1135652 

[15] Sezer, A. D. (2005). Dynamic importance sampling for queueing networks. Ph.D. 

dissertation, Brown Univ. 
[16] Weber, R. R. (1979). The interchangeability in -/M/l queues in series. J. Appl. 

Probab. 16 690-695. MR0540808 



P. Dupuis 
A. D. Sezer 
H. Wang 

Division of Applied Mathematics 
Brown University 
Providence, Rhode Island 02912 
USA 

E-MAIL: dupuis@cfm. brown. cdu 
sezer@cfm.brown.edu 
huiwang@cfm.brown.edu 



